动物运动跟踪和姿势识别的进步一直是动物行为研究的游戏规则改变者。最近,越来越多的作品比跟踪“更深”,并解决了对动物内部状态(例如情绪和痛苦)的自动认识,目的是改善动物福利,这使得这是对该领域进行系统化的及时时刻。本文对基于计算机的识别情感状态和动物的疼痛的研究进行了全面调查,并涉及面部行为和身体行为分析。我们总结了迄今为止在这个主题中所付出的努力 - 对它们进行分类,从不同的维度进行分类,突出挑战和研究差距,并提供最佳实践建议,以推进该领域以及一些未来的研究方向。
translated by 谷歌翻译
骨科疾病在马匹中常见,通常导致安乐死,这通常可以通过早期的检测来避免。这些条件通常会产生不同程度的微妙长期疼痛。培训视觉疼痛识别方法具有描绘这种疼痛的视频数据是挑战性的,因为所产生的疼痛行为也是微妙的,稀疏出现,变得不同,使得甚至是专家兰德尔的挑战,为数据提供准确的地面真实性。我们表明,一款专业培训的模型,仅涉及急性实验疼痛的马匹(标签不那么暧昧)可以帮助识别更微妙的骨科疼痛显示。此外,我们提出了一个问题的人类专家基线,以及对各种领域转移方法的广泛实证研究以及由疼痛识别方法检测到矫形数据集的清洁实验疼痛中的疼痛识别方法检测到的内容。最后,这伴随着围绕现实世界动物行为数据集所带来的挑战以及如何为类似的细粒度行动识别任务建立最佳实践的讨论。我们的代码可在https://github.com/sofiabroome/painface-recognition获得。
translated by 谷歌翻译
Masked Image Modelling (MIM) has been shown to be an efficient self-supervised learning (SSL) pre-training paradigm when paired with transformer architectures and in the presence of a large amount of unlabelled natural images. The combination of the difficulties in accessing and obtaining large amounts of labeled data and the availability of unlabelled data in the medical imaging domain makes MIM an interesting approach to advance deep learning (DL) applications based on 3D medical imaging data. Nevertheless, SSL and, in particular, MIM applications with medical imaging data are rather scarce and there is still uncertainty. around the potential of such a learning paradigm in the medical domain. We study MIM in the context of Prostate Cancer (PCa) lesion classification with T2 weighted (T2w) axial magnetic resonance imaging (MRI) data. In particular, we explore the effect of using MIM when coupled with convolutional neural networks (CNNs) under different conditions such as different masking strategies, obtaining better results in terms of AUC than other pre-training strategies like ImageNet weight initialization.
translated by 谷歌翻译
Bias elimination and recent probing studies attempt to remove specific information from embedding spaces. Here it is important to remove as much of the target information as possible, while preserving any other information present. INLP is a popular recent method which removes specific information through iterative nullspace projections. Multiple iterations, however, increase the risk that information other than the target is negatively affected. We introduce two methods that find a single targeted projection: Mean Projection (MP, more efficient) and Tukey Median Projection (TMP, with theoretical guarantees). Our comparison between MP and INLP shows that (1) one MP projection removes linear separability based on the target and (2) MP has less impact on the overall space. Further analysis shows that applying random projections after MP leads to the same overall effects on the embedding space as the multiple projections of INLP. Applying one targeted (MP) projection hence is methodologically cleaner than applying multiple (INLP) projections that introduce random effects.
translated by 谷歌翻译
Manually analyzing spermatozoa is a tremendous task for biologists due to the many fast-moving spermatozoa, causing inconsistencies in the quality of the assessments. Therefore, computer-assisted sperm analysis (CASA) has become a popular solution. Despite this, more data is needed to train supervised machine learning approaches in order to improve accuracy and reliability. In this regard, we provide a dataset called VISEM-Tracking with 20 video recordings of 30s of spermatozoa with manually annotated bounding-box coordinates and a set of sperm characteristics analyzed by experts in the domain. VISEM-Tracking is an extension of the previously published VISEM dataset. In addition to the annotated data, we provide unlabeled video clips for easy-to-use access and analysis of the data. As part of this paper, we present baseline sperm detection performances using the YOLOv5 deep learning model trained on the VISEM-Tracking dataset. As a result, the dataset can be used to train complex deep-learning models to analyze spermatozoa. The dataset is publicly available at https://zenodo.org/record/7293726.
translated by 谷歌翻译
The body of research on classification of solar panel arrays from aerial imagery is increasing, yet there are still not many public benchmark datasets. This paper introduces two novel benchmark datasets for classifying and localizing solar panel arrays in Denmark: A human annotated dataset for classification and segmentation, as well as a classification dataset acquired using self-reported data from the Danish national building registry. We explore the performance of prior works on the new benchmark dataset, and present results after fine-tuning models using a similar approach as recent works. Furthermore, we train models of newer architectures and provide benchmark baselines to our datasets in several scenarios. We believe the release of these datasets may improve future research in both local and global geospatial domains for identifying and mapping of solar panel arrays from aerial imagery. The data is accessible at https://osf.io/aj539/.
translated by 谷歌翻译
作为估计高维网络的工具,图形模型通常应用于钙成像数据以估计功能性神经元连接,即神经元活动之间的关系。但是,在许多钙成像数据集中,没有同时记录整个神经元的人群,而是部分重叠的块。如(Vinci等人2019年)最初引入的,这导致了图形缝问题,在该问题中,目的是在仅观察到功能的子集时推断完整图的结构。在本文中,我们研究了一种新颖的两步方法来绘制缝的方法,该方法首先使用低级协方差完成技术在估计图结构之前使用低级协方差完成技术划分完整的协方差矩阵。我们介绍了三种解决此问题的方法:阻止奇异价值分解,核标准惩罚和非凸低级别分解。尽管先前的工作已经研究了低级别矩阵的完成,但我们解决了阻碍遗失的挑战,并且是第一个在图形学习背景下研究问题的挑战。我们讨论了两步过程的理论特性,通过证明新颖的l无限 - 基 - 误差界的矩阵完成,以块错失性证明了一种提出的方​​法的图选择一致性。然后,我们研究了所提出的方法在模拟和现实世界数据示例上的经验性能,通过该方法,我们显示了这些方法从钙成像数据中估算功能连通性的功效。
translated by 谷歌翻译
轨迹预测是成功的人类机器人相互作用的必不可少的任务,例如在自动驾驶中。在这项工作中,我们解决了使用移动摄像机在第一人称视图设置中预测未来行人轨迹的问题。为此,我们提出了一种新型的基于动作的对比学习损失,该损失利用行人行动信息来改善学习的轨迹嵌入。这一新损失背后的基本思想是,在特征空间中,执行相同行动的行人的轨迹比具有明显不同动作的行人的轨迹更接近彼此。换句话说,我们认为有关行人行动的行为信息会影响他们的未来轨迹。此外,我们为轨迹引入了一种新型的采样策略,能够有效地增加负面和阳性对比样品。使用训练有素的条件变异自动编码器(CVAE)生成其他合成轨迹样品,该样品是为轨迹预测开发的几种模型的核心。结果表明,我们提出的对比框架采用了有关行人行为的上下文信息,即有效的行动,并学习了更好的轨迹表示。因此,将所提出的对比框架集成在轨迹预测模型中可以改善其结果,并在三个轨迹预测基准上胜过最先进的方法[31,32,26]。
translated by 谷歌翻译
事实证明,神经网络是以非常低的比特率解决语音编码问题的强大工具。但是,可以在现实世界中可以强大操作的神经编码器的设计仍然是一个重大挑战。因此,我们提出了神经末端2端语音编解码器(NESC),可用于3 kbps的高质量宽带语音编码的稳定,可扩展的端到端神经语音编解码器。编码器使用一种新的体系结构配置,该配置依赖于我们提出的双PATHCONVRNN(DPCRNN)层,而解码器体系结构基于我们以前的工作streamwise-stylemelgan。我们对干净和嘈杂的语音的主观听力测试表明,NESC对于看不见的条件和信号扰动特别强大。
translated by 谷歌翻译
全球抗菌耐药性(AMR)的增加是对人类健康的严重威胁。为了避免AMR的传播,快速可靠的诊断工具可以促进最佳的抗生素管理。在这方面,拉曼光谱学有望在一步中快速标记和无培养物鉴定以及抗菌敏感性测试(AST)。但是,尽管许多基于拉曼的细菌识别和AST研究表现出了令人印象深刻的结果,但仍必须解决一些缺点。为了弥合概念验证研究和临床应用之间的差距,我们与新的数据增强算法相结合开发了机器学习技术,以快速鉴定最小制备的细菌表型和甲氧西林抗甲氧西林(MR)的区别(MR)的区别甲氧西林敏感(MS)细菌。为此,我们为细菌的超光谱拉曼图像实施了光谱变压器模型。我们表明,我们的模型在精度和训练时间方面都超过了许多分类问题的标准卷积神经网络模型。对于六种MR-MS细菌物种,我们在数据集中达到了超过96美元的分类精度,该数据集由15个不同类别和95.6 $ \%$分类精度。更重要的是,我们的结果仅使用快速,易于生产的培训和测试数据获得
translated by 谷歌翻译